Goto

Collaborating Authors

 flow model


Distilling LLMPrior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation

Neural Information Processing Systems

The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic and discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched fullscene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and shows strong generalization in transfer settings to HM3D. Codes and pretrained models are available at https://github.com/Badi-Li/GOAL.


Greed is Good: AUnifying Perspective on Guided Generation

Neural Information Processing Systems

Training-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of flow/diffusion models. Generally speaking, two families of techniques have emerged for solving this problem for gradient-based guidance: namely, posterior guidance (i.e., guidance by projecting the current sample to the target distribution via the target prediction model) and end-to-end guidance (i.e., guidance by performing backpropagation throughout the entire ODE solve). In this work, we show that these two seemingly separate families can actually be unified by looking at the posterior guidance as a greedy strategy of end-to-end guidance. We explore the theoretical connections between these two families and provide an in-depth theoretical understanding of these two techniques relative to the continuous ideal gradients. Motivated by this analysis, we then show a method for interpolating between these two families enabling a trade-off between compute and accuracy of the guidance gradients.


Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models

Neural Information Processing Systems

Training-free guidance enables controlled generation in diffusion and flow models, but most methods rely on gradients and assume differentiable objectives. This work focuses on training-free guidance addressing challenges from non-differentiable objectives and discrete data distributions. We propose TreeG: Tree Search-Based Path Steering Guidance, applicable to both continuous and discrete settings in diffusion and flow models. TreeG offers a unified framework for training-free guidance by proposing, evaluating, and selecting candidates at each step, enhanced with tree search over active paths and parallel exploration. We comprehensively investigate the design space of TreeG over the candidate proposal module and the evaluation function, instantiating TreeG into three novel algorithms. Our experiments show that TreeG consistently outperforms top guidance baselines in symbolic music generation, small molecule design, and enhancer DNA design with improvements of 29.01%,26.38%,and


Wukong's 72 Transformations: High-fidelity Textured 3DMorphing via Flow Models

Neural Information Processing Systems

We present WUKONG, a novel training-free framework for high-fidelity textured 3D morphing that takes a pair of source and target prompts (image or text) as input. Unlike conventional methods--which rely on manual correspondence matching and deformation trajectory estimation (limiting generalization and requiring costly preprocessing)--WUKONG leverages the generative prior of flow-based transformers to produce high-fidelity 3D transitions with rich texture details. To ensure smooth shape transitions, we exploit the inherent continuity of flow-based generative processes and formulate morphing as an optimal transport barycenter problem. We further introduce a sequential initialization strategy to prevent abrupt geometric distortions and preserve identity coherence. For faithful texture preservation, we propose a similarity-guided semantic consistency mechanism that selectively retains high-frequency details and enables precise control over blending dynamics. This empowers WUKONG to support both global texture transitions and identitypreserving texture morphing, catering to diverse generation needs. Extensive quantitative and qualitative evaluations demonstrate that WUKONG significantly outper-Corresponding author.


Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing

Neural Information Processing Systems

We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in LLMs and diffusion models, improving sample quality or better aligning outputs with user preferences by leveraging additional computation. For diffusion models, particle sampling has allowed more efficient scaling due to the stochasticity at intermediate denoising steps. On the contrary, while flow models have gained popularity as an alternative to diffusion models--offering faster generation and high-quality outputs--efficient inference-time scaling methods used for diffusion models cannot be directly applied due to their deterministic generative process. To enable efficient inference-time scaling for flow models, we propose three key ideas: 1) SDE-based generation, enabling particle sampling in flow models, 2) Interpolant conversion, broadening the search space and enhancing sample diversity, and 3) Rollover Budget Forcing (RBF), an adaptive allocation of computational resources across timesteps to maximize budget utilization. Our experiments show that SDE-based generation and variance-preserving (VP) interpolant-based generation, improves the performance of particle sampling methods for inference-time scaling in flow models. Additionally, we demonstrate that RBF with VP-SDE achieves the best performance, outperforming all previous inference-time scaling approaches.


Supplementary Material AProof of Proposition 2

Neural Information Processing Systems

Proposition 2. (From main text) The Bayes error of flow models is monotonically increasing in . That is, for 0 < 0, we have that EBayes(ห†p) EBayes(ห†p 0). B.1 Hardness of Classes In addition to measuring the difficulty of classification tasks relative to one another, it also may be of interest to evaluate the relative difficulty of individual classes within a particular task. A natural way to do this is by looking at the error of one-vs-all classification tasks. The optimal Bayes classifier in this task is CBayes(x)= 0 if logpj(x) logp j(x), 1 otherwise .


Evaluating State-of-the-Art Classification Models Against Bayes Optimality

Neural Information Processing Systems

Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the Bayes error, which measures the optimal classification error theoretically achievable for a given data distribution. While generally an intractable quantity, we show that we can compute the exact Bayes error of generative models learned using normalizing flows. Our technique relies on a fundamental result, which states that the Bayes error is invariant under invertible transformation. Therefore, we can compute the exact Bayes error of the learned flow models by computing it for Gaussian base distributions, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some -- but not all -- cases, these models are capable of obtaining accuracy very near optimal. Finally, we use our method to evaluate the intrinsic "hardness" of standard benchmark datasets.


Self-Supervised Motion Magnification by Backpropagating Through Optical Flow

Neural Information Processing Systems

This paper presents a simple, self-supervised method for magnifying subtle motions in video: given an input video and a magnification factor, we manipulate the video such that its new optical flow is scaled by the desired amount. To train our model, we propose a loss function that estimates the optical flow of the generated video and penalizes how far if deviates from the given magnification factor. Thus, training involves differentiating through a pretrained optical flow network. Since our model is self-supervised, we can further improve its performance through test-time adaptation, by finetuning it on the input video. It can also be easily extended to magnify the motions of only user-selected objects. Our approach avoids the need for synthetic magnification datasets that have been used to train prior learning-based approaches.


Discrete Flow Matching

Neural Information Processing Systems

Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers several key contributions: (i) it works with a general family of probability paths interpolating between source and target distributions; (ii) it allows for a generic formula for sampling from these probability paths using learned posteriors such as the probability denoiser ($x$-prediction) and noise-prediction ($\epsilon$-prediction); (iii) practically, focusing on specific probability paths defined with different schedulers improves generative perplexity compared to previous discrete diffusion and flow models; and (iv) by scaling Discrete Flow Matching models up to 1.7B parameters, we reach 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks. Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion, significantly closing the gap between autoregressive models and discrete flow models.